Boosting OCR Accuracy Using Crowdsourcing
Abstract
Book digitization is important for preserving ancient heritage. However, digitizing books involves a series of labor-intensive tasks, one of which is verifying optical character recognition (OCR) output. In this paper, we propose a crowdsourceable OCR verification method. Using our method, content holders can leverage the power of crowds to complete verification tasks while avoiding content leakage. Our experimental results show that the method is more efficient and reliable than the traditional method.
Related resources
Crowdsourcing an OCR Gold Standard for a German and French Heritage Corpus
Crowdsourcing approaches for post-correction of OCR output (Optical Character Recognition) have been successfully applied to several historic text collections. We report on our crowd-correction platform Kokos, which we built to improve the OCR quality of the digitized yearbooks of the Swiss Alpine Club (SAC) from the 19th century. This multilingual heritage corpus consists of Alpine texts mainl...
Boosting Optical Character Recognition: A Super-Resolution Approach
Text image super-resolution is a challenging yet open research problem in the computer vision community. In particular, low-resolution images hamper the performance of typical optical character recognition (OCR) systems. In this article, we summarize our entry to the ICDAR2015 Competition on Text Image Super-Resolution. Experiments are based on the provided ICDAR2015 TextSR dataset [3] and the ...
Digitalkoot: Making Old Archives Accessible Using Crowdsourcing
In this paper, we present Digitalkoot, a system for fixing errors in the Optical Character Recognition (OCR) process of old texts through the use of human computation. By turning the work into simple games, we are able to attract a great number of volunteers to donate their time and cognitive capacity for the cause. Our analysis shows how untrained people can reach very high accuracy through th...
Perform Three Data Mining Tasks with Crowdsourcing Process
In data mining studies, performing feature selection by hand is complex, so some of the labeling must be sent to workers through crowdsourcing. The process of outsourcing data mining tasks to users is often handled by software systems that lack sufficient knowledge of the users' age or place of residence. Uncertainty about the performance of virtual user...
Learning to Scale Payments in Crowdsourcing with PropeRBoost
Motivating workers to provide significant effort has been recognized as an important issue in crowdsourcing. It is important not only to compensate worker effort, but also to discourage low-quality workers from participating. Several proper incentive schemes have been proposed for this purpose; they are either based on gold tasks or on peer consistency in individual tasks. As the rewards cannot...